Data Set Balancing
Author
Abstract
This paper reports experiments with three skewed data sets, seeking to demonstrate the problems that arise when skewed data is used and to identify the counter-problems that arise when the data is balanced. The basic data mining algorithms of decision tree, regression-based, and neural network models are considered, using both categorical and continuous data. Two of the data sets have binary outcomes, while the third has four possible outcomes. The key findings are that when the data is highly unbalanced, algorithms tend to degenerate by assigning all cases to the most common outcome, and that when the data is balanced, accuracy rates tend to decline. Balancing also reduces the training set size, which can lead to model failure when cases encountered in the test set have been omitted from the training set. Decision tree algorithms were found to be the most robust with respect to the degree of balancing applied.
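The two effects described in the abstract can be illustrated with a minimal sketch. It assumes random undersampling as the balancing method (the abstract does not name the method used, only that balancing shrinks the training set), and the 95/5 label split is hypothetical data, not taken from the paper. The function names `undersample` and `majority_baseline_accuracy` are illustrative only.

```python
import random
from collections import Counter


def undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subset.

    Assumption: balancing is done by random undersampling, i.e. every
    class is cut down to the size of the rarest class.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    return sorted(keep)


def majority_baseline_accuracy(labels):
    """Accuracy of a degenerate model that assigns every case
    to the most common outcome."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


# Hypothetical skewed binary outcome: 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5

# On the skewed data, the degenerate model already looks accurate.
skewed_accuracy = majority_baseline_accuracy(labels)

# After balancing, only 10 of the 100 cases remain for training,
# and the same degenerate model drops to chance level.
balanced = undersample(labels)
balanced_labels = [labels[i] for i in balanced]
balanced_accuracy = majority_baseline_accuracy(balanced_labels)

print(skewed_accuracy, len(balanced), balanced_accuracy)
```

In this toy case the balanced set keeps one tenth of the original cases, which shows how balancing can drop rare attribute values from the training data while they still appear in the test set.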
Similar resources
Multi-objective scheduling and assembly line balancing with resource constraint and cost uncertainty: A “box” set robust optimization
Assembly lines are flow-oriented production systems that are of great importance in the industrial production of standard, high-volume products and even more recently, they have become commonplace in producing low-volume custom products. The main goal of designers of these lines is to increase the efficiency of the system and therefore, the assembly line balancing to achieve an optimal system i...
Assembly line balancing to minimize balancing loss and system loss
Assembly line production is one of the widely used basic principles in production systems. The problem of assembly line balancing deals with the distribution of activities among the workstations so that there will be maximum utilization of human resources and facilities without disturbing the work sequence. Research works reported in the literature mainly deal with minimization of idle time i.e...
A New Balancing and Ranking Method based on Hesitant Fuzzy Sets for Solving Decision-making Problems under Uncertainty
The purpose of this paper is to extend a new balancing and ranking method to handle uncertainty for a multiple attribute analysis under a hesitant fuzzy environment. The presented hesitant fuzzy balancing and ranking (HF-BR) method does not require attributes’ weights through the process of multiple attribute decision making (MADM) under hesitant conditions. For the rating of possible alternati...
Mixed-Model Assembly Line Balancing with Considering Reliability
This paper presents a multi-objective simulated annealing algorithm for mixed-model assembly line balancing with stochastic processing times. Since the stochastic task times may affect the bottlenecks of a system, maximizing the weighted line efficiency (equivalent to minimizing the number of stations), minimizing the weighted smoothness index and maximizing the system reliabil...
A Multi-Objective Particle Swarm Optimization for Mixed-Model Assembly Line Balancing with Different Skilled Workers
This paper presents a multi-objective Particle Swarm Optimization (PSO) algorithm for worker assignment and mixed-model assembly line balancing problem when task times depend on the worker’s skill level. The objectives of this model are minimization of the number of stations (equivalent to the maximization of the weighted line efficiency), minimization of the weighted smoothness index and minim...
A Hybrid Unconscious Search Algorithm for Mixed-model Assembly Line Balancing Problem with SDST, Parallel Workstation and Learning Effect
Due to the variety of products, simultaneous production of different models has an important role in production systems. Moreover, considering realistic constraints in designing production lines has attracted a lot of attention in recent research. Since the assembly line balancing problem is NP-hard, efficient methods are needed to solve this kind of problem. In this study, a new hybrid met...